Spectral Clustering for Microsoft Netscan Data

Authors

  • Anne Patrikainen
  • Marina Meilă
Abstract

We present the results of exploratory data analysis for a data set that consists of crossposting information for 89,687 newsgroups over a period of 3.4 years. The data set we use is a part of Microsoft Netscan data. Our goal is to investigate the community structure of the newsgroup data set with a specific focus on spectral hierarchical clustering. We present a spectral hierarchical clustering algorithm and discuss existing and novel ways to measure the quality of a hierarchical clustering. We construct spectral hierarchical clusterings for ten subsets of the data set and evaluate the stability of the results.
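The abstract does not spell out the algorithm itself. As a rough illustration of what a spectral hierarchical clustering can look like, the sketch below uses one common generic approach: recursive bipartitioning by the sign of the second eigenvector of the normalized graph Laplacian. This is an assumption chosen for illustration, not necessarily the authors' method. The matrix `W` (a symmetric crossposting-count matrix between newsgroups), the function names, and the `min_size` stopping rule are all hypothetical.

```python
import numpy as np

def fiedler_split(W):
    """Split the nodes of a symmetric affinity matrix W into two groups using
    the sign of the second-smallest eigenvector of the normalized Laplacian."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against isolated nodes
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)       # eigenvalues are returned in ascending order
    v = vecs[:, 1] * d_inv_sqrt       # rescale back to the random-walk embedding
    return v >= 0

def spectral_hierarchy(W, nodes=None, min_size=2):
    """Recursively bipartition W; returns a nested list of node indices."""
    if nodes is None:
        nodes = np.arange(len(W))
    if len(nodes) <= min_size:
        return nodes.tolist()
    mask = fiedler_split(W[np.ix_(nodes, nodes)])
    if mask.all() or not mask.any():  # degenerate split: stop here
        return nodes.tolist()
    return [spectral_hierarchy(W, nodes[mask], min_size),
            spectral_hierarchy(W, nodes[~mask], min_size)]

# Toy data (hypothetical): two groups of three newsgroups that crosspost
# heavily within each group and only weakly across groups.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
tree = spectral_hierarchy(W, min_size=3)
```

On the toy matrix the top-level split separates the two densely connected groups. A real analysis would also need a principled stopping criterion and a stability measure, which the abstract says the paper discusses.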


Similar resources

Clustering Algorithm for Network Constraint Trajectories

Spatial data mining is an active topic in spatial databases. This paper proposes a new clustering method for moving-object trajectory databases. It applies specifically to trajectories that lie only on a predefined network. The proposed algorithm (NETSCAN) is inspired by the well-known density-based algorithms. However, it takes advantage of the network constraint to estimate the object dens...


Clustering Spectral Filters for Extensible Feature Extraction in Musical Instrument Classification

We propose a technique for training feature-extraction models using prior expectations about the regions of importance in an instrument’s timbre. Over a dataset of training examples, we extract significant spectral peaks, calculate their ratio to the fundamental frequency, and use k-means clustering to identify a set of windows of spectral prominence for each instrument. These windows are used to extract...


Restricted Boltzmann Machines with Gaussian Visible Units Guided by Pairwise Constraints

Restricted Boltzmann machines (RBMs) and their variants are usually trained by contrastive divergence (CD) learning, but this training procedure is unsupervised and receives no guidance from background knowledge. To enhance the expressive ability of traditional RBMs, in this paper we propose a pairwise-constraints restricted Boltzmann machine with Gaussian visible units (pcGR...


Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has drawn many researchers to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...


Simultaneous spectral analysis of multiple video sequence data for LWIR gas plumes

We consider the challenge of detection of chemical plumes in hyperspectral image data. Segmentation of gas is difficult due to the diffusive nature of the cloud. The use of hyperspectral imagery provides non-visual data for this problem, allowing for the utilization of a richer array of sensing information. We consider several videos of different gases taken with the same background scene. We i...



Publication date: 2005